Scattering Representation of Modulated Sounds
Abstract
Mel-frequency spectral coefficients (MFSCs), calculated by averaging the spectrogram along a mel-frequency scale, are used in many audio classification tasks. Their efficiency can be partly explained by their stability to deformations in a Euclidean norm. However, averaging the spectrogram loses high-frequency information. This loss is reduced by keeping the window size small, around 20 ms, which in turn prevents MFSCs from capturing large-scale structures. Scattering coefficients recover part of this lost information using a cascade of wavelet decompositions and modulus operators, which makes larger window sizes possible. This representation is sufficiently rich to capture note attacks, amplitude and frequency modulation, as well as chord structure.
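The cascade described in the abstract can be illustrated with a minimal two-layer sketch in NumPy. This is not the authors' implementation: the Gaussian band-pass filters (instead of proper Morlet wavelets), the function names `gabor_bank`, `lowpass`, `scattering` and `am_tone`, and all parameter values (quality factors, an 8192-sample signal, a 4096-sample averaging window) are assumptions chosen only for illustration. The first-layer averages play a role comparable to MFSCs, and the second layer is meant to recover the amplitude modulation that a long averaging window removes.

```python
import numpy as np

def gabor_bank(n, centers, q):
    """Analytic Gaussian band-pass filters, defined in the Fourier domain.

    `centers` are in cycles per sample; each filter has a Gaussian
    width of center / q. (Assumed stand-in for Morlet wavelets.)
    """
    freqs = np.fft.fftfreq(n)                        # cycles per sample
    centers = np.asarray(centers, dtype=float)[:, None]
    return np.exp(-0.5 * ((freqs[None, :] - centers) / (centers / q)) ** 2)

def lowpass(n, window):
    """Gaussian low-pass filter averaging over roughly `window` samples."""
    freqs = np.fft.fftfreq(n)
    return np.exp(-0.5 * (freqs * window) ** 2)

def scattering(x, centers1, centers2, window, q1=8, q2=4):
    """Return first- and second-order time-averaged coefficients.

    s1 has shape (len(centers1), n); s2 has shape
    (len(centers1), len(centers2), n).
    """
    n = len(x)
    phi = lowpass(n, window)
    # Layer 1: band-pass filtering followed by a modulus.
    u1 = np.abs(np.fft.ifft(np.fft.fft(x)[None, :] * gabor_bank(n, centers1, q1), axis=1))
    U1 = np.fft.fft(u1, axis=1)
    # First-order coefficients: time average of the layer-1 envelopes.
    s1 = np.real(np.fft.ifft(U1 * phi, axis=1))
    # Layer 2: band-pass filter each envelope, take the modulus again.
    psi2 = gabor_bank(n, centers2, q2)
    u2 = np.abs(np.fft.ifft(U1[:, None, :] * psi2[None, :, :], axis=2))
    # Second-order coefficients: time average of the layer-2 envelopes.
    s2 = np.real(np.fft.ifft(np.fft.fft(u2, axis=2) * phi, axis=2))
    return s1, s2

def am_tone(n, carrier_cycles, mod_cycles, depth=0.8):
    """Amplitude-modulated tone with whole numbers of cycles over n samples."""
    t = np.arange(n) / n
    envelope = 1.0 + depth * np.cos(2 * np.pi * mod_cycles * t)
    return envelope * np.cos(2 * np.pi * carrier_cycles * t)

n = 8192
centers1 = np.array([512, 1024, 2048]) / n    # first-layer (acoustic) frequencies
centers2 = np.array([16, 64]) / n             # second-layer (modulation) frequencies
slow = am_tone(n, carrier_cycles=1024, mod_cycles=16)
fast = am_tone(n, carrier_cycles=1024, mod_cycles=64)
s1_slow, s2_slow = scattering(slow, centers1, centers2, window=4096)
s1_fast, s2_fast = scattering(fast, centers1, centers2, window=4096)
```

In this toy setting the two tones differ only in modulation rate, so the long averaging window should make their first-order coefficients nearly indistinguishable, while the second-order coefficients at the carrier channel should peak at different modulation frequencies.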
Similar resources
The Representation of Non-Linguistic Sounds in Persian and English Subtitles for the Deaf and Hard-of-Hearing: A Comparative Study
Subtitling for the deaf and hard-of-hearing (SDH) is an area which deserves special attention as it enables these people to access the part of the ‘world’ intended for hearing people, including the world of ‘motion pictures’, and particularly movie sounds. Compared to linguistic sounds, non-linguistic sounds have received little attention in the field of translation, although they are in...
Transformée en scattering sur la spirale temps-chroma-octave
We introduce a scattering representation for the analysis and classification of sounds. It is locally translation-invariant, stable to deformations in time and frequency, and has the ability to capture harmonic structures. The scattering representation can be interpreted as a convolutional neural network which cascades a wavelet transform in time and along a harmonic spiral. We study its applic...
Calculation of the proton-deuteron breakup scattering cross section at intermediate energies
In this paper, we reformulate three-nucleon breakup scattering in the leading-order approximation by considering spin-isospin degrees of freedom. First, considering the inhomogeneous part of the Faddeev equation, which is a valid approximation at high and intermediate energies, we present the Faddeev equation as a function of vector Jacobi momenta and spin and isospin quantum numbers. In this new fo...
Separate neural systems for processing action- or non-action-related sounds.
The finding of a multisensory representation of actions in a premotor area of the monkey brain suggests that similar multimodal action-matching mechanisms may also be present in humans. Based on the existence of an audiovisual mirror system, we investigated whether sounds referring to actions that can be performed by the perceiver underlie different processing in the human brain. We recorded mu...
Wavelet Scattering on the Pitch Spiral
We present a new representation of harmonic sounds that linearizes the dynamics of pitch and spectral envelope, while remaining stable to deformations in the time-frequency plane. It is an instance of the scattering transform, a generic operator which cascades wavelet convolutions and modulus nonlinearities. It is derived from the pitch spiral, in that convolutions are successively performed in...